ABSTRACT

This handbook for training program evaluators provides background
information and practice activities in the following areas:
(1) measurement: purposes, ideals, possibilities; (2) defining
measurement domains; (3) person and item sampling; (4) test and item
selection, with a selected list of standardized tests with pertinent
information on each, suggestions for writing objective test items, and
formulae for item analysis; and (5) objective observation. It is
recommended that each group in training choose one of the four sample
situations described and use it throughout the sessions. The
situations are: (1) prekindergarten program for disadvantaged
children; (2) introduction of teacher aides (elementary);
(3) individualizing instruction through computer-based resource units
(high school); and (4) improving interracial attitudes and
knowledge. (KM)
FOREWORD

The increased competition for the tax dollar has caused and
will continue to cause more rigorous evaluations in all fields of
education, particularly at the Federal level. Increasingly, legislators
and their constituent taxpayers are demanding hard data which will
indicate whether a costly program is achieving that which it has
purported to achieve. Under these conditions evaluation at all levels must
satisfy the criteria elements of significance, credibility, and timeliness.
Within this framework evaluative techniques must be strengthened.

Appropriate departmental personnel believed that strengthening
the evaluative effort of the State might start with categorically aided
projects at the elementary and secondary education level.

Appropriate people from within the State were asked to prepare
and conduct formal lessons accompanied by simulated experiences and
related materials. Thus this document is one in a series of review
manuals to be used by appropriate local education. The contents of the
series are appropriate for use with large program evaluative problems
such as those encountered in ESEA, Urban Education, or the like.

This document on Measurement was prepared by S. David Farr and
Michael J. Subkoviak, State University of New York at Buffalo.
TITLE III EVALUATORS TRAINING

MEASUREMENT

Organization

Objectives: Learn, practice, share insights and examples

Number of Units: 5

Time per Unit: 90 minutes

Time Within Units (Approximate):

     30 minutes - Lecture

     30 minutes - Student work

     30 minutes - Reporting and discussion

Formation of Groups for Student Work:

     Initial assignments made by instructor - changes permitted

Choice of Sample Situations: Recommended that each group
     choose a situation and use it throughout sessions

Reporting Student Work:

     1) Rotate recorder within group (arbitrary assignments by
        instructors may be varied)

     2) Recorder will keep notes in black ink (pen provided)
        for reproduction

     3) Recorder will also serve as reporter
MEASUREMENT: PURPOSES, IDEALS, POSSIBILITIES

Although many people think of measurement only in terms of the student
outcomes which an experimental program attempts to change, a much broader view
is desirable. Outcomes are only one of three classes of observations, and
student outcomes are a subset of that class. The setting of any study should
be carefully described so that others may interpret results in terms of their
own situation. This may include measures on the community, the teachers and
other school personnel, the physical facilities, the students' initial
abilities, and the society in general. In addition, a clear description of the
nature of the experimental program is essential. Usually, only through
observation or other measurement procedures can the program be described as it
really happened. Proof that specified "treatments" were really administered to
the students and a record of resulting changes in classroom behavior are the
only adequate description of the program. Viewed broadly, then, the measurement
plan for an innovative program should include measures on the setting, the
process of the program, and its outcomes. Adequate attention to these three
classes of measurements requires a serious commitment to the measurement effort.

Two general ideals apply no matter what is being measured or how the
measurements are made. These are meaningfulness and precision. To make
meaningful interpretations of measurements, the tasks assigned, the method of
observing, and the way scores are formed must follow a logical system.
Simplicity and directness often are helpful in producing meaningful scores.
Precision deals with whether a measurement, when replicated, will produce the
same result. Precision of individual measurements is less important when an
aggregate, for example a student body, is the object to be measured.
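
The last point can be illustrated numerically. The sketch below is not part of the original handbook; it assumes that each individual score carries independent measurement error with a known standard deviation (the 10-point figure is hypothetical). Under that assumption the error in a group mean shrinks with the square root of the group size.

```python
import math


def se_of_group_mean(error_sd: float, n_persons: int) -> float:
    """Standard error of a group mean when every individual score has
    independent measurement error with standard deviation `error_sd`."""
    if n_persons < 1:
        raise ValueError("n_persons must be at least 1")
    return error_sd / math.sqrt(n_persons)


# Hypothetical instrument: error SD of 10 points for a single student.
for n in (1, 25, 400):
    print(n, se_of_group_mean(10.0, n))
```

With these assumed figures, an instrument too imprecise to describe one student may still describe a student body of several hundred quite closely.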
TITLE III EVALUATORS TRAINING

MEASUREMENT UNIT 1

Measurement: Purposes, Ideals, Possibilities

I. Purposes: Accurate description of

     A. Setting

     B. Treatment (Process)

     C. Outcomes

II. Ideals

     A. Meaningfulness

     B. Accuracy, or precision
III. Possibilities

     A. Setting

          1. measurement on community
          2. teachers
          3. other personnel - administrators, aides
          4. physical facilities
          5. children
          6. historical events

     B. Treatment

          1. measurement of specified treatment details
          2. nonspecified general description
          3. other routine observations
          4. use of facilities

     C. Outcomes

          1. student behavior
          2. teacher behavior
          3. auxiliary personnel
          4. parent or community
          5. delayed effects, persistence of observed effects

IV. Summary

     A. Multiple purposes and variables
     B. Meaningfulness and accuracy

References:

     R. W. Tyler, R. M. Gagné and M. Scriven. Perspectives of
          Curriculum Evaluation. No. 1, AERA Monograph Series on
          Curriculum Evaluation. Chicago: Rand McNally, 1967.






UNIT 1 ACTIVITIES

Using the assigned or selected sample situation, plan
a comprehensive measurement (information gathering) program
for that project. Since descriptions of the sample situation
are necessarily sketchy you may assume reasonable additions to
and specification of the objectives and procedures.

The primary task is to specify what you wish to observe
or assess and when. Do not be concerned about exactly how traits
or abilities will be assessed or behaviors observed.

Assemble your decisions in the form of a rough chronology
of observation or data collection.

Do not hesitate to set up a more extensive plan than
practical considerations will allow. Such a plan can always be
pared down later.

Time: 30 minutes

The reporter will present a 5-10 minute summary to the group.
DEFINING MEASUREMENT DOMAINS

Once a decision to measure a certain variable has been reached, the
problem of how the measurement is to be made must be faced. It is easy to talk
about achievement, anxiety, valuation, or attitudes, but it is another thing to
measure them. A promising approach is to define a domain of tasks, observations,
and conditions relevant to the specified variable. Since these domains are
usually very large, measurements are made by sampling some of the elements. By
knowing the extent and boundaries of the domain, however, the meaningfulness of
measurements based on such samples may be intelligently assessed.

This approach to measurement is associated with a theory of
generalizability proposed by Cronbach and others (1963). The task is stated as
defining a domain of "conditions" where the word conditions has a very general
meaning. It includes, for example, different elements of content. The problem
2 + 2 = ___ and the problem 3 + 5 = ___ are two different conditions for
observing arithmetic skill. Similarly, a free response and a multiple choice
item based on the same content would represent different conditions. In
addition to such formal variations, conditions may vary temporally. For
example, a domain may include delayed measures or only those taken at the close
of the program. Variation in the situation in which observations are made is a
more familiar use of "conditions" but is only one of many meanings assigned to
the word in this conception.

There are no established procedures for defining measurement domains. Both
listing of included and excluded elements, and stating rules for inclusion or
exclusion would seem useful. In practice, some combination of these two
techniques is often most effective.
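
The combination of rules and listings can be sketched in code. The example below is illustrative only and is not taken from the handbook: the content (one-digit addition problems), the two formats, and the single excluded condition are all assumed for the sake of the sketch. A rule generates the domain, an explicit list removes exceptions, and the domain score is the mean of 1-0 scores over every included condition.

```python
from itertools import product

# Rule (assumed example): every one-digit addition problem a + b,
# presented in either of two formats.
CONTENT = list(product(range(10), repeat=2))
FORMATS = ["free_response", "multiple_choice"]

# Listing (assumed example): specific conditions excluded by name.
EXCLUDED = {((0, 0), "multiple_choice")}


def in_domain(condition) -> bool:
    """Apply the inclusion rule, then the explicit exclusion list."""
    (a, b), fmt = condition
    follows_rule = 0 <= a <= 9 and 0 <= b <= 9 and fmt in FORMATS
    return follows_rule and condition not in EXCLUDED


domain = [c for c in product(CONTENT, FORMATS) if in_domain(c)]


def domain_score(responses: dict) -> float:
    """Mean of 1-0 scores over all conditions in the domain."""
    return sum(responses[c] for c in domain) / len(domain)


print(len(domain))  # number of conditions in the defined domain
```

In practice the domain is too large to administer in full, which is why the score is estimated from a sample of conditions.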






TITLE III EVALUATORS TRAINING
MEASUREMENT UNIT 2

Defining Measurement Domains

I. Problem: Specifying what we wish to measure

     A. Rational constructs - achievement, anxiety, valuation

     B. Range of indicators must be defined

II. Domain Definition: Specification of "conditions" included
    in domain

     A. Definition of conditions: Any aspect of the observation
        or its setting which may vary

     B. Domain score: Mean score over all observations included
        in domain (percent correct if 1-0 scoring)

     C. Factors of domain definition

          1. entity to be measured - persons vs. classes or other
             aggregates
          2. content
          3. formal
          4. temporal
          5. observer
          6. situational

III. Techniques of Domain Definition

     A. Listing included conditions
     B. Rules for inclusion and exclusion
     C. Spiral use of listings and rules

IV. Summary

     A. Concept of domain of conditions

     B. Diverse ways conditions may vary

     C. Need for precise and complete specification

References:

     B. S. Bloom, et al. Taxonomy of Educational Objectives:
          Cognitive Domain. New York: Longmans, Green & Co., 1956.

     L. J. Cronbach, et al. Theory of Generalizability. Brit. J.
          Statistical Psychol., XVI, Part II, 137-163, 1963.

     D. Krathwohl, et al. Taxonomy of Educational Objectives:
          Affective Domain. New York: David McKay, 1964.

     Michael Scriven. The Methodology of Evaluation, in Perspectives
          of Curriculum Evaluation. AERA Monograph Series on
          Curriculum Evaluation, No. 1. Chicago: Rand McNally, 1967.
UNIT 2 ACTIVITIES

Using the selected sample situation, quickly select two
of the traits, performances, behaviors, etc. that you wish to
assess. Select:

     1) one very clearly defined maximum performance
        domain, e.g. a skill or achievement

     2) one typical behavior trait or domain, e.g. a
        habit, value, or attitude

Develop a definition of each domain. Be as complete as
possible in the time allowed.

     Maximum performance domain - 10 minutes

     Typical behavior domain - 20 minutes

Do not limit yourself to paper-and-pencil self-report
behavior for the typical behavior domain.

The reporter should be prepared to give a brief definition
of each domain to the group and to note problems encountered in the
definition process.



PERSON AND ITEM SAMPLING

Once a domain of conditions has been defined and a population of persons
specified, two types of questions often appear. The first is whether the
population of persons can perform adequately on a particular element of the
measurement domain. This question is usually answered by estimating the
proportion of the person population which can perform at or above some
specified level. The second question is to what extent a particular person has
mastered the entire measurement domain. This may often be approached by
estimating the proportion of the conditions within the domain which the person
could perform satisfactorily. Both questions can therefore be reduced to the
estimation of a proportion, the proportion of persons passing a specified item,
or the proportion of items passed by a specified person.

Estimation theory points out that in neither case is it necessary to make
all possible observations to reach an adequate estimate. Therefore, the usual
practice of selecting a few conditions (items) and administering them to all
students in an innovative program is very often wasteful. When the primary
interest is in the first type of question, it might be better to draw two sets
of items and administer each set to half the students. Each set might be
equally effective in estimating domain proportions for individuals, but the
item performance data would be available for twice as many items. Once the
habits of traditional procedures are broken, the range of possibilities for
sampling items and persons expands.
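
The two-set plan described above can be sketched as follows. This is an illustration under assumed figures (40 items and 500 students, both hypothetical), not a procedure prescribed by the handbook; the function name `split_half_plan` is likewise invented for the example.

```python
import random


def split_half_plan(items, persons, seed=0):
    """Randomly split the items into two sets and the persons into two
    halves; each half of the persons takes one of the item sets."""
    rng = random.Random(seed)
    items, persons = list(items), list(persons)
    rng.shuffle(items)
    rng.shuffle(persons)
    half_i, half_p = len(items) // 2, len(persons) // 2
    return {
        "form_1": {"items": items[:half_i], "persons": persons[:half_p]},
        "form_2": {"items": items[half_i:], "persons": persons[half_p:]},
    }


# Hypothetical sizes: 40 items drawn from the domain, 500 students.
plan = split_half_plan(range(40), range(500))
responses = sum(len(f["items"]) * len(f["persons"]) for f in plan.values())
items_covered = sum(len(f["items"]) for f in plan.values())
print(responses, items_covered)
```

Giving one 20-item set to all 500 students would require the same number of responses but yield item data on only 20 items; the split plan covers 40 items, each answered by 250 students.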

Procedures can be developed from random sampling theory for describing 
how accurate the estimates provided by any particular sampling plan will be. 
Conversely, it is possible to specify the desired level of accuracy and find
the number of items and persons which must be sampled. These procedures make
possible the preparation of an efficient measurement plan for an innovative 
program. 
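
Both directions of this calculation can be sketched for the simplest case. The code below assumes simple random sampling without replacement from a finite population and uses the standard finite-population formula for the standard error of a proportion; the function names, the 1.96 normal deviate, and the example figures are assumptions made for illustration.

```python
import math


def se_proportion(p: float, n: int, N: int) -> float:
    """Standard error of a sample proportion for n units drawn without
    replacement from a population of N units (finite-population form)."""
    q = 1.0 - p
    return math.sqrt((p * q / n) * (N - n) / (N - 1))


def required_n(p: float, N: int, tolerance: float, z: float = 1.96) -> int:
    """Smallest sample size for which z standard errors do not exceed
    `tolerance`, obtained by solving the expression above for n."""
    q = 1.0 - p
    n = N * z**2 * p * q / (z**2 * p * q + (N - 1) * tolerance**2)
    return math.ceil(n)


# Hypothetical case: 500 students, an item of .50 difficulty,
# and a tolerance of .10 on the estimated proportion.
n = required_n(0.5, 500, 0.10)
print(n, 1.96 * se_proportion(0.5, n, 500))
```

The same two functions apply to the second question by treating the items in the domain as the population and the items given to one person as the sample.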
TITLE III EVALUATORS TRAINING
MEASUREMENT UNIT 3

Person and Item Sampling

I. Problem: Describe typical level of performance of

     A. Universe of persons or
     B. Domain of tasks or items

II. Subproblems: Interest may be in

     A. Estimating proportion of persons correctly answering
        a single item
     B. Estimating proportion of items correctly answered by
        a single person
     C. Estimating mean and dispersion of the proportion of
        items correctly answered
     D. Estimating covariation among items

III. Need for Sampling and Procedures

     A. Ideal
     B. Traditional research approach
     C. Norming approach
     D. Joint sampling

IV. Example of Various Sampling Techniques

     A. Item domain: 100 one-digit multiplication facts, paper and
        pencil, free response, specified time

     B. Person domain: 500 students in study

     C. 50,000 responses: too great a number

     D. 10,000 response plans

V. Evaluation of Plans - Done in terms of questions

     A. Item "difficulty" (success of program on specific criteria)
          1. estimate proportion passing single items



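             $\sigma_p^2 = \frac{pq}{n} \cdot \frac{N - n}{N - 1}$

             $n = \frac{N z^2 pq}{z^2 pq + (N - 1) T^2}$

             where p is the proportion passing the item, q = 1 - p,
             n is the number of persons sampled, N is the number of
             persons in the population, z is the normal deviate for
             the chosen confidence level, and T is the tolerated
             error in the estimate.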
     B. Individuals' scores (success of program with individuals)
          1. estimate proportion of items in the domain passed by
             this individual

     C. Estimate mean and variance of population distribution of
        universe scores (what is typical performance)

     D. Covariance among responses (is there a pattern of success
        and failure on items)

VI. Summary

     A. Concern is for estimating several parameters

          1. typical performance (mean score)
          2. item probability
          3. person scores
          4. item covariances

     B. Sampling designs for group measurement depend on

          1. desired accuracy of estimation
          2. relative importance of various objectives
          3. practical limitations
          4. use of random or stratified random sampling of
             available items and persons

     C. Generalization beyond populations sampled is logical
        problem

References:

     Thomas R. Knapp. An application of balanced incomplete
          block designs to the estimation of test norms. Educ.
          and Psychol. Meas., 28, 265-272 (1968).

     Frederick Lord. Estimating norms by item sampling.
          Educ. and Psychol. Meas., 22, 259-267 (1962).

     Lynnette B. Plumlee. Estimating means and standard
          deviations from partial data. Educ. and Psychol.
          Meas., 24, 623-630 (1964).

     H. M. Walker and J. Lev. Statistical Inference. New York:
          Holt, 1953. (esp. pp. 68-72)



UNIT 3 ACTIVITIES

Choose a clearly defined measurement domain (such as the
maximum performance domain from Unit 2 activities) and specify
the size of the population of persons available in the sample
situation selected. Assume all items are scored 1 or 0.

Work either Activity A or Activity B.

A. Develop a plan for sampling items and persons, and find the
   accuracy it gives for:

     1. estimating the proportion of persons correctly answering
        a specified item.
     2. estimating the proportion of the item domain which could
        be answered correctly by a specified person.
     3. estimating the mean performance for persons on the domain
        of items.

B. Specify the accuracy of estimation desired for:

     1. estimating the proportion of persons correctly answering
        an item (use an item with .50 difficulty) and
     2. estimating the proportion of the item domain an individual
        would pass (use a 50 percent person).

   Calculate the number of persons per item and items per person
   required and construct a sampling plan.

   What is the accuracy produced for estimating typical performance
   on the domain of items?

Reporter should report

     1) Which activity was attempted

     2) What plan or tolerances were specified

     3) What tolerances or plan resulted

     4) Implications of results
TEST AND ITEM SELECTION

In selecting a standardized test or selecting items for a "homemade" test
which will represent some measurement domain, the primary concern is whether
the items used may be considered a reasonable sampling of the domain. A clear
definition of the domain's dimensions and boundaries will provide the
information necessary for a logical analysis of the question.

Empirical operations can also be helpful in analyzing a set of items by
highlighting peculiar response patterns which suggest that an item is not
dependent on the ability intended, and therefore may not be properly included
as sampled from the specified domain. The item may then be discarded or
revised. A classic example is the item response which has accidentally been
incorrectly keyed. The tendency of students who otherwise perform well to miss
this item and to choose the option which is actually correct calls the error to
the examiner's attention.

The inconsistency between item performance and some more general
performance which revealed the miskeyed item illustrates the general nature of
item analysis. Items in a domain are expected to be homogeneous enough so that
there is positive covariation between almost any pair of items and certainly
between any item and the domain score, a property usually called internal
consistency. Most item analysis procedures are designed to show a lack of
internal consistency.

Analyzing the nature of the inconsistency by studying the distribution
of responses over the options of multiple choice items may assist the evaluator
to find the source of the irregularity. These techniques provide empirical
checks on domain definition and sampling which are helpful to the test
constructor.






TITLE III EVALUATORS TRAINING
MEASUREMENT UNIT 4

Test and Item Selection

I. Two Types of Achievement Test:

     A. Subjective test - The grader is allowed to exercise
        personal judgment extensively in scoring the test.

     B. Objective test - The grader is permitted little, if any,
        freedom of personal judgment in scoring the test.

     The present discussion will be restricted to type B.

II. Standardized and Self-Made Achievement Tests

     A. Standardized test - A test for which items have been
        carefully selected and which has been administered to
        various normative groups.

          1. examples of standardized tests - see handout
          2. advantages of standardized tests
          3. references for standardized tests - see handout
          4. considerations in choosing a standardized test

     B. Self-made test - A test constructed for a specific
        purpose and which has not been extensively used.

          1. item writing - see handout
          2. item analysis - see handout

References:

     N. M. Downie. Fundamentals of Measurement: Techniques and
          Practices. New York: Oxford University Press, 1967.

     J. R. Gerberich. Specimen Objective Test Items. New York:
          Longmans, Green & Co., 1956.

     H. A. Greene, A. N. Jorgensen and J. R. Gerberich. Measurement
          and Evaluation in the Secondary School. New York:
          David McKay Co., 1964.

     N. E. Gronlund. Measurement and Evaluation in Teaching.
          New York: The Macmillan Co., 1965.
Sources of Information About Standardized Tests

     O. K. Buros. Tests in Print. Highland Park, N. J.: Gryphon Press,
          1961.

     O. K. Buros. Mental Measurements Yearbook. Gryphon Press. Published
          periodically.

Test Publishers:

     1.  American Guidance Service, Inc.
         720 Washington Avenue, S.E.
         Minneapolis, Minnesota 55414

     2.  Bureau of Publications
         Teachers College,
         Columbia University
         New York, New York 10027

     3.  California Test Bureau
         Del Monte Research Park
         Monterey, California 93940

     4.  Consulting Psychologists Press, Inc.
         577 College Street
         Palo Alto, California 94306

     5.  Cooperative Test Division
         Educational Testing Service
         Princeton, New Jersey 08541

     6.  Harcourt, Brace & World, Inc.
         757 Third Avenue
         New York, New York 10017

     7.  Houghton Mifflin Company
         2 Park Street
         Boston, Massachusetts 02107

     8.  Personnel Press, Inc.
         188 Nassau Street
         Princeton, New Jersey 08541

     9.  Psychological Corporation
         304 East 45th Street
         New York, New York 10017

     10. Science Research Associates, Inc.
         259 East Erie Street
         Chicago, Illinois 60611




TITLE III EVALUATORS TRAINING

MEASUREMENT

Suggestions for Item Writing

General Suggestions

1. Express the item as clearly as possible.

2. Choose words that have precise meaning wherever possible.

3. Avoid complex or awkward word arrangements.

4. Include all qualifications needed to provide a reasonable basis for
   response selection.

5. Avoid the inclusion of nonfunctional words in the item.

   Poor:   When sailors put out to sea for long periods of time,
           vitamin C, in most instances, is added to diets to prevent
             A. beri-beri           C. sterility
             B. cretinism         + D. scurvy

   Better: Vitamin C is added to diets to prevent
             A. beri-beri           C. sterility
             B. cretinism         + D. scurvy

6. Avoid unessential specificity in the stem or the responses.

   Poor:   If President Nixon and Vice President Agnew were to die, they
           would be succeeded by
           + A. Speaker of the House McCormack
             B. Chief Justice of the Supreme Court Warren
             C. Secretary of State Rogers
             D. Secretary of Defense Laird

   Better: If the President and Vice President of the United States
           were to die, they would be succeeded by
           + A. the Speaker of the House
             B. the Chief Justice of the Supreme Court
             C. the Secretary of State
             D. the Secretary of Defense

7. Avoid irrelevant inaccuracies in any part of the item.

8. Adapt the level of item difficulty to the group and purpose for which
   it is intended.

9. Avoid irrelevant clues to the correct response.

   Poor:   A test is said to be valid when
           + A. it measures what it is supposed to measure
             B. including only multiple-choice items
             C. reliability is important too
             D. to score it one is objective

   Better: A test is said to be valid when
           + A. it measures what it is supposed to measure
             B. it includes only multiple-choice items
             C. it is reliable
             D. it is objective

10. In order to defeat the rote-learner, avoid stereotyped phraseology in
    the stem or the correct response.

11. Avoid irrelevant sources of difficulty.



Short Answer Form

1. Use the short-answer form only for questions that can be answered by
   a unique word, phrase, or number.

2. Do not borrow statements verbatim from context and attempt to use them
   as short-answer items.

3. Make the question, or the directions, explicit.

4. Allow sufficient space for pupil answers, and arrange the spaces for
   convenience in scoring.

5. In computational problems specify the degree of precision expected, or
   better still, arrange the problems to come out even unless the ability
   to handle fractions and decimals is being tested.

6. Avoid overabundance of completion exercises.

The True-False Form

1. Base true-false items only on statements which are true or false
   without qualifications.

   Poor:   It is a short trip from Chicago to Detroit. (T or F)
   Better: In a super jet, it is a short trip from Chicago to Detroit.
           (T or F)

2. Avoid the use of long and involved statements with many qualifying
   phrases.

   Poor:   If the President were to die and if the Vice President were
           to assume command and then also die, the Speaker of the
           House would become President. (T)

   Better: If the President and Vice President both die, the Speaker
           of the House becomes President. (T)

3. Avoid the use of sentences borrowed from texts or other sources as
   true-false items.

Multiple-Choice Form

1. Use either a direct question or an incomplete statement as the item
   stem.

   Poor:   Charles Darwin
             A. was a renowned chemist
           + B. formulated a theory of evolution
             C. discovered the proton
             D. proved the Central Limit Theorem

   Better: Charles Darwin was a
             A. chemist
           + B. naturalist
             C. physicist
             D. statistician

2. In general, include in the stem any words that must otherwise be
   repeated in each response.

   Poor:   One of the major functions of the adrenal gland is
           + A. to regulate the amount of sugars in the blood
             B. to regulate the amount of protein sent to body cells
             C. to regulate the secretion of wastes
             D. to regulate the amount of insulin



   Better: One of the major functions of the adrenal gland is to
           regulate the
           + A. amount of sugars in the blood
             B. amount of protein sent to body cells
             C. secretion of wastes
             D. secretion of insulin

3. If possible, avoid a negatively stated item stem.

4. Provide a response that competent critics can agree is the best.

5. Make all the responses appropriate to the item stem.

6. Make all distracters plausible and attractive to examinees who lack
   the information or ability tested by the item.

   Poor:   The area of a circle with a diameter equal to 12 is
           + A.
             B.
             C.
             D.

   Better: The area of a circle with a diameter equal to 12 is
           approximately
             A. 19 (using πr)        + C. 113 (using πr²)
             B. 38 (using πd)          D. 453 (using πd²)

7. Avoid highly technical distracters.

8. Avoid responses that overlap or include each other.

9. Use "none of these" as a response only in items to which an absolutely
   correct answer can be given; use it as an obvious answer several times
   early in the test but use it sparingly thereafter; avoid using it as
   the answer to items in which it may cover a large number of incorrect
   responses.

10. Arrange the responses in logical order, if one exists, but avoid
    consistent preference for any particular response position.

11. If the item deals with the definition of a term, it is often preferable
    to include the term in the stem and present alternative definitions
    in the responses.

12. Do not present a collection of true-false statements as a
    multiple-choice item.



Matching Exercises

1. Group only homogeneous premises and homogeneous responses in a single
   matching item.

   Poor:   1. ΣX/n                   A. standard deviation
           2. statistician           B. mean
           3. Σ(X − X̄)²/n            C. Samuel Wilks
                                     D. variance

   Better: 1. ΣX/n                   A. standard deviation
           2. Σ(X − X̄)²/n            B. mean
                                     C. standard score
                                     D. variance
2. Use relatively short lists of responses.

3. Arrange premises and responses for maximum clarity and convenience
   to the examinee.

4. The directions should clearly explain the intended basis for
   matching.

5. Do not attempt to provide perfect one-to-one matching between premises
   and responses (more responses than premises).
Item Analysis Formulae

1. DIFFICULTY OF ITEM i = $\frac{C_i}{T_i} \times 100$

     $C_i$ = the number of persons answering item i correctly
     $T_i$ = the total number of persons who responded to item i

2. DISCRIMINATING POWER OF ITEM i = $D_i = \frac{C_{Ui} - C_{Li}}{T_i / 2}$

     $C_{Ui}$ = the number of persons scoring in the upper half on the
                test and who answer item i correctly
     $C_{Li}$ = the number of persons scoring in the lower half on the
                test and who answer item i correctly
     $T_i$ = the total number of persons who responded to item i

3. TEST RELIABILITY = KR20 = $\frac{k}{k-1} \left[ 1 - \frac{\sum_{i=1}^{k} p_i q_i}{s_T^2} \right]$

     $k$ = total number of items on the test
     $p_i = C_i / T$ = proportion of persons responding correctly to item i
           ($C_i$ = number of persons answering item i correctly; $T$ =
           total number of persons taking the test)
     $q_i = 1 - p_i$
     $s_T^2$ = variance of the total test scores



Exercise: A group of 100 persons took a four-item test, and the following
outcomes were observed.

     Item No.    No. Correct      No. Correct      Total No.
                 in Upper Half    in Lower Half    Responding

        1             50                0             100
        2             15                5             100
        3             50               50             100
        4             10               40             100

The mean and variance of the test were determined to be 2.20 and 1.32
respectively. Determine:

(1) the difficulty of each item. Which items are too easy and which are
    too difficult?

(2) the discriminating power of each item. Which items are good
    discriminators and which are not?

(3) Is the test highly reliable or not?



Answers to Exercise

     Item No.    No. Correct      No. Correct      Total No.
                 in Upper Half    in Lower Half    Responding

        1             50                0             100
        2             15                5             100
        3             50               50             100
        4             10               40             100

     Mean = 2.20
     Variance = 1.32

(1)    i         C_i              T_i

       1     50 +  0 =  50        100
       2     15 +  5 =  20        100
       3     50 + 50 = 100        100
       4     10 + 40 =  50        100

     Difficulty of 1 = ( 50/100) x 100 =  50%
     Difficulty of 2 = ( 20/100) x 100 =  20% (too difficult)
     Difficulty of 3 = (100/100) x 100 = 100% (too easy)
     Difficulty of 4 = ( 50/100) x 100 =  50%

(2)    i     C_Ui    C_Li    C_Ui - C_Li    T_i    T_i/2

       1      50       0         50         100     50
       2      15       5         10         100     50
       3      50      50          0         100     50
       4      10      40        -30         100     50

     D_1 =  50/50 = 1.00 (good discriminator)
     D_2 =  10/50 =  .20 (weak discriminator)
     D_3 =   0/50 =  .00 (does not discriminate)
     D_4 = -30/50 = -.60 (negative discriminator)
(3)    i     p_i = C_i / T             q_i     p_i q_i

       1     (50 +  0)/100 =  .50      .50      .2500
       2     (15 +  5)/100 =  .20      .80      .1600
       3     (50 + 50)/100 = 1.00      .00      .0000
       4     (10 + 40)/100 =  .50      .50      .2500
                                               ------
                              Sum of p_i q_i =  .6600

     $\text{KR-20} = \frac{k}{k-1} \left[ 1 - \frac{\sum p_i q_i}{s_T^2} \right]
                   = \frac{4}{3} \left[ 1 - \frac{.66}{1.32} \right]
                   = \frac{4}{3} \left[ 1 - \frac{1}{2} \right]
                   = \frac{4}{3} \times \frac{1}{2}
                   = \frac{2}{3} = .67$ (not highly reliable)
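
The three formulae can be checked against the exercise with a short script. This is a minimal sketch rather than part of the original handout; the function names are invented for the example, and the data are the four items and the total-score variance of 1.32 given in the exercise.

```python
def difficulty(correct: int, total: int) -> float:
    """Percent of respondents answering the item correctly."""
    return 100.0 * correct / total


def discrimination(correct_upper: int, correct_lower: int, total: int) -> float:
    """Upper-half minus lower-half correct, divided by half the group."""
    return (correct_upper - correct_lower) / (total / 2)


def kr20(p_values, total_variance: float) -> float:
    """Kuder-Richardson formula 20 from item proportions and the
    variance of the total test scores."""
    k = len(p_values)
    sum_pq = sum(p * (1.0 - p) for p in p_values)
    return (k / (k - 1)) * (1.0 - sum_pq / total_variance)


# Exercise data: item -> (correct in upper half, correct in lower half, total)
items = {1: (50, 0, 100), 2: (15, 5, 100), 3: (50, 50, 100), 4: (10, 40, 100)}

for i, (upper, lower, total) in items.items():
    print(i, difficulty(upper + lower, total), discrimination(upper, lower, total))

p_values = [(upper + lower) / total for upper, lower, total in items.values()]
print(round(kr20(p_values, 1.32), 2))
```

Item 4 shows why difficulty alone is not enough: it has the same 50 percent difficulty as item 1, yet it discriminates in the wrong direction.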



OBJECTIVE OBSERVATION 

» 

The concern for meaningful measurements often suggests the use of techniques
other than the typical pencil and paper test. A broad and useful class
of data gathering procedures is objective observation. There are, however, many
ways error may creep into such measurements, many possible sources of "slippage"
between the raw input to the observer and the recording of a number or symbol
representing that observation. This problem has recently been studied by
Webb et al. (1966), who have emphasized two qualities of measurements which
help achieve the general ideals of meaningfulness and precision.

The first is nonreactivity, a quality achieved when the measurement
process does not affect the thing being measured. A common problem is that
reactive measurements often become an important part of the treatment. On the
other hand, a reactive measure may produce a temporary effect, making a
meaningful measurement impossible. The effect of observers in small groups is an
obvious example. Allowing adaptation periods and undetectable observation are
two techniques for countering reactivity. The latter, of course, raises questions
of ethics.

The second quality is consistency of calibration, a property existing
when the same phenomenon observed twice will produce the same measurement. The
tendency of participant observers to notice certain things when they first join
a new culture, and other things after they have observed for some time, illustrates
inconsistency of calibration. A more common illustration is provided by the
decrease of alertness resulting from fatigue during a series of consecutive
observations. Training, simplicity and clear definition of procedures, and
attention to physical limitations help keep calibration consistent.
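One simple way to check consistency of calibration is to have the same recorded events coded twice and count how often the two codings agree. The sketch below is an illustration only: the proportion-of-agreement index, the function name `proportion_agreement`, and the behavior codes are assumptions made for the example, not a procedure given by Webb or by this handbook.

```python
def proportion_agreement(first_pass, second_pass):
    """Share of events that received the same code on both passes."""
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("both passes must code the same, nonempty set of events")
    matches = sum(a == b for a, b in zip(first_pass, second_pass))
    return matches / len(first_pass)


# Hypothetical codes assigned to the same five events on two occasions.
first = ["talk", "talk", "play", "idle", "play"]
second = ["talk", "play", "play", "idle", "play"]

agreement = proportion_agreement(first, second)
print(f"Agreement between passes: {agreement:.2f}")
```

A value well below 1.00 suggests the observer's calibration shifted between passes, for example through fatigue or changed definitions.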

A final concern not emphasized by Webb is the need for reasonable sampling
of the behavior domain. The risk of using a single behavior to represent a
domain is an instance of generalizing from one case.



TITLE IV EVALUATORS TRAINING 
MEASUREMENT UNIT 5



Objective Observation 



I. Stages

A. Collection
B. Storage
C. Reduction
D. Storage
E. Summarization
F. Storage
G. Reporting



II. Some Issues

A. Should collection and reduction be combined?
B. How may selectivity be controlled?



III. Ideals

A. Consistency of calibration
B. Nonreactiveness
C. Unbiased sampling of domain of conditions



IV. Reduction of Information

A. Accuracy (objectivity)
B. Meaningfulness



V. Storage: Problems and process

A. Files: liquor cabinet vs. cemetery
B. Coded data
C. Housekeeping vs. housecleaning
D. Written reports



VI. Summary

A. Major processes - observation, reduction, storage
B. Objectives - accuracy and meaningfulness
C. Techniques



Reference: 



E. J. Webb, D. T. Campbell, R. D. Schwartz, and L. Sechrest.
Unobtrusive Measures. Chicago: Rand McNally, 1966.
UNIT 5 ACTIVITY



Choose one of the characteristics, behaviors, etc.
suggested in the Unit 1 Activities (perhaps the typical
performance domain analyzed in Unit 2). Suggest as many ways
in which the behaviors of the domain could be observed as you
can in 20 minutes. In the final 10 minutes, analyze each
observation procedure for (1) nonreactiveness and (2)
consistency of calibration.

Be free with suggestions during the first phase. 

The reporter should select a few of the procedures for 
presentation on the basis of creativity, quality, or interesting 
problems presented. 



SAMPLE SITUATION A



Prekindergarten Program for Disadvantaged Children 

The primary purpose of this program is to increase verbal communication 
skills and broaden the children's range of experience. Ninety 4-year-old 
children will be selected from a depressed area of the city. They will spend 
^ day, 5 days a week at the center, for 1 school year. The curricula will be 
planned by three teachers and a developmental psychologist available 1 day a 
week. The teachers will be assisted by the equivalent of six full-time persons 
recruited from the children's parents. It is believed that participation by a
parent may have a substantial effect on the home environment. The program will 
be conducted in the basement of the Methodist church. Available are two office- 
sized rooms, two slightly larger rooms, and a large open area. Desired equip- 
ment will be provided by project or community funds. 



SAMPLE SITUATION B 


Introduction of Teacher Aides 

The primary objective of this program is to provide the teacher more
teaching time by assigning nonteaching tasks to teacher aides. It is assumed
that achievement of the objective will result in improved student achievement
and teacher morale. Twenty 4th grade classrooms will participate. An aide
will be available to each teacher during all school hours. After orientation
by the central unit, each aide will be assigned to a teacher, to do whatever
the teacher asks. The central unit will provide short training sessions as
requested by the teachers or aides. The aides must have some college education.
The setting is suburban.
SAMPLE SITUATION C



Individualizing Instruction Through Computer Based Resource Units 

The primary objective of this program is to increase individualization
of instruction by providing lists of materials, activities, and projects
appropriate to the teacher's objectives and the child's individual
characteristics. Twenty 11th grade social studies classes will participate. For each
class the teacher will choose from a list of objectives and record each child's
individual characteristics on a check sheet. Abilities, interests, and
background factors are included. From the computer-stored unit, the teacher will
receive lists of group activities and individual lists of resources and
activities for each child. Each teacher will use three such units during a
single semester. No special provision of materials will be made. It is
expected that successful individualization will result in improved student
interest and achievement and a feeling of productivity in the teachers.
SAMPLE SITUATION D

Improving Interracial Attitudes and Knowledge 

This is a two-pronged study to be conducted in the 8th grades of four
school districts. The objective is to insure knowledge of the Negro's
contribution to past and present societies, and to produce favorable attitudes
toward other groups. Each of the schools is about 50 percent white.
Preparations will be made during the fall semester and activities conducted during
the spring.

A panel of teachers, augmented by a curriculum specialist, a Negro
leader, and a full-time clerical worker will collect materials and activities
relevant to the units normally taught in 8th grade, stressing the contributions
of Negroes and the Negro community. The widest possible range of subject matters
will be covered. The panel will also suggest ways the special materials can
be integrated into the usual unit presentation. All teachers will use at least
some of the materials.

The second prong consists of an interested university group training
teachers in techniques for changing attitudes. Procedures relevant to each
major subject will be provided. Procedures for altering both whites' attitudes
toward blacks and blacks' attitudes toward whites will be supplied. Each
major subject teacher agrees to use two of the provided attitude change routines
during the second semester.




